In the following visualizations, LDA models have been trained on the same content, but with that content split into documents of varying sizes. In every case the number of topics is 80, and the implementation comes from the gensim library. In the first case, the documents correspond to the paragraphs found in the original source.
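A minimal sketch of how such fixed-size splits could be produced from a tokenized source text (`chunk_tokens` is a hypothetical helper for illustration, not part of `ktm_prepviz`):

```python
def chunk_tokens(tokens, size):
    """Split a flat token list into documents of roughly `size` tokens each.

    The final document may be shorter when the corpus length is not a
    multiple of `size`.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

corpus = ["token"] * 1050          # stand-in for a tokenized source text
docs = chunk_tokens(corpus, 100)   # documents of ~100 tokens each
```

Splitting by paragraph instead (the first case above) would simply replace the fixed-size slicing with a split on paragraph boundaries before tokenization.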
import ktm_prepviz, pyLDAvis
# The default for mds is mmds; tsne is also possible. pcoa is not recommended,
# since it sometimes produces errors in the returned JSON.
vis = ktm_prepviz.prepviz("doclength", mds="tsne")
There are various distance measures and projection methods available; here, the inter-topic distances are computed via the Jensen-Shannon divergence and then reduced to two dimensions, in this case with t-SNE (metric multidimensional scaling is the default).
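For reference, the Jensen-Shannon divergence between two topic distributions can be sketched in pure Python as follows (the function names are illustrative, not pyLDAvis internals):

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence (base 2); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: the mean KL divergence of p and q
    from their midpoint distribution m. Symmetric and, in base 2,
    bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Two toy topic-word distributions over a three-word vocabulary
d = js_divergence([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
```

Unlike the plain KL divergence, this quantity is symmetric, which is what makes it usable as a distance measure between topics.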
# Variable length of documents (split by paragraph)
pyLDAvis.display(vis[0])
# Documents with an average length of around 25 tokens
pyLDAvis.display(vis[1])
# Documents with an average length of around 100 tokens
pyLDAvis.display(vis[2])
# Documents with an average length of around 500 tokens
pyLDAvis.display(vis[3])
# Documents with an average length of around 1000 tokens
pyLDAvis.display(vis[4])